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Foreword 


This  report  was  prepared  by  the  University  of  California  In  fulfill¬ 
ment  of  the  requirements  of  Contract  F49620-84-C-0090  with  the  Air  Force 
Office  of  Scientific  Research.  The  project  manager  for  this  contract  was 
Capt.  John  Thomas  of  AFOSR. 

The  technical  work  was  performed  during  the  period  August  1984  through 
July  1985.  The  technical  efford  Involved  development  of  simple  and  block 
Lanczos  algorithms  at  the  University  of  California  and  testing  of  these 
algorithms  for  modal  analysis  of  typical  structural  analysis  problems  at 
Lockheed  Missiles  and  Space  Company,  Inc.  Work  at  the  University  of 
California  was  carried  out  by  Ors.  T.  Ericsson,  B.  Nour-Omid  and  Professor 
B.N.  Parlett  under  the  direction  of  Prof.  Parlett.  Work  at  Lockheed  Missiles 
and  Space  Company  was  carried  out  by  Paul  S.  Jensen  under  contract  to  the 
University  of  California. 

One  report  i  s  being  prepared  as  a  result  of  this  research  for  pub¬ 
lication  in  Math.  Comp.  It  is  entitled  "How  to  Implement  the  Spectral 
Transformation,"  by  all  four  participants  in  this  project. 

The  contract  was  approved  too  late  to  provide  support  for  Or.  Nour-Omid 
who  left  the  Center  for  Pure  and  Applied  Math.  In  April  1984.  The  surplus 
funds  went  in  partial  support  of  Dr.  Ericsson. 


INTRODUCTION 


B.N.  Parlett 
Aug.  6,  1985 


Research  over  the  past  9  years  at  the  University  of  California  has 


resulted. in  a  number  of  reports. 


\ 


11-10]  on  the  general  theory  of  the  Lanczos  algorithm  for  large  symmetric 
eigenproblems.  This  work  was  extended  4a— lS^,to  apply  to  the  large  general¬ 
ized  symmetric  eigenanalysis  problem.  The  pyrpose~o£  this  effortf“wa*-t£  Im¬ 


plement*  the  results  of  the  past  research  for  application  to  large  generalized 


— > 


i  J 


problems  arising  in  structural  analysis. 

1 .0  Simple  Lanczos 

The  first  implementation  developed  was  a  simple  Lanczos  algorithm  based 
on  selective  orthogonal ization  [3  and  9]  and  a  new  technique  [11]  for  moni¬ 
toring  eigenvalues  of  the  intermediate  tridiagonal  matrices  Tj. 

These  are  our  approximate  eigenvalues.  At  the  same  time  we  get  error 
bounds.  By  monitoring  at  every  step  the  Lanczos  run  may  be  terminated  as 
early  as  possible.  This  poses  the  challenge  of  carrying  out  this  monitoring 
so  effectively  that  it  increases  the  cost  of  a  Lanczos  step  by  less  than  5t, 
preferably  by  less  than  1%.  This,  in  turn,  means  updating  a  few  eigenvalues 

of  a  j  x  j  tridiagonal  in  0{j)  arithmetic  operations.  For  comparisons  we 

2 

note  that  the  EISPACK  program  IMTQL1  requires  about  9j  to  compute  all 
eigenvalues.  Our  solution  is  given  in  [11] 


After  considerable  testing  on  the  local  VAX/UNIX  system  this  simple 
Lanczos  Implementation  LANSO  Mas  sent  to  Lockheed  and  Mas  up  and  running 
by  Sept.  1984  (See  Sec. 2).  By  October  troubles  became  apparent.  For  short 
runs  of  20/25  steps,  picking  up  7  or  8  eigenvalues,  all  seemed  Mell.  However 
the  code  regularly  malfunctioned  a  little  later,  certainly  before  Step  50. 

A  serious  tactical  error  In  the  code  Mas  the  decision  to  monitor  not 
more  than  8  unconverged  eigenvalues  per  step.  This  value  seemed  effective 
In  the  tests  at  Berkeley.  It  mbs  only  In  June  1985  that  Me  realized  that 
this  celling  mbs  provoking  most  of  the  difficulties  with  LANSO.  The  easy 
change  from  8  to  16  permitted  runs  of  up  to  100  steps.  It  so  happened  that 
6  slowly  converging  eigenvalues  blocked  from  view  eigenvalues  that  were  con¬ 
verging  rapidly. 

Here  is  the  great  value  of  this  collaborative  research  effort.  With¬ 
out  Lockheed's  difficult  problems  this  design  defect  would  not  have  been 
uncovered. 

1.1  Blcok  Lanczos 

The  block  Lanczos  Implementation  BLANSO  was  sent  to  Lockheed  in 
October  1984  and  was  running  by  January,  1985. 

All  but  one  aspect  of  the  simple  Lanczos  program  generalize  readily, 
but  expensively,  to  the  block  case.  The  intermediate  matrix  T  is  now  block 
tridiagonal  and  it  is  proving  difficult  to  generalize  the  program  ANALYZET 
for  monitoring  the  convergence  of  T's  eigenvalues.  As  an  interim  measure 
we  are  using  EISPACK  codes  but  these  are  too  general  and  too  expensive  for 
long  runs.  Not  only  are  eigenvalues  needed  but  the  last  section  of  the 
associated  eigenvectors. 

Our  research  on  this  topic  is  described  by  Dr.  T.  Ericsson  In  Section  3. 


Our  investigations  revealed  an  Interesting  problem  to  which  no  attention 


has  been  given  tin  the  literature.  What  happens  when  the  Inertia  Matrix  M 
Is  positive  semi-definite  but  deficient  In  rank?  This  feature  of  M  occurs 
quite  frequently,  especially  in  the  analysis  of  structures  Intended  for 
service  In  space.  We  had  assumed  that  no  difficulties  would  arise.  Indeed 
If  all  vectors  are  projected  onto  the  subspace  associated  with  the 
finite  eigenvalues  then  no  difficulties  do  arise.  What  we  realized,  albeit 
gradually,  is  that  every  vector  Is  contaminated  by  components  In  the  null 
space  of  M  and  these  components  grow  steadily.  One  reason  that  this  phe¬ 
nomenon  may  have  lain  dormant  for  so  long  Is  that  It  Is  difficult  to  detect. 
All  our  usual  measures  Involve  M  and  look  fine  because  M  blanks  out  these 
unwanted  components.  A  consequence  of  this  feature  Is  that  computed  results 
turn  out  to  be  less  accurate  than  expected.  The  phenomenon  Is  not  special 
to  our  codes.  It  will  affect  all  Implementations  based  on  direct  Iteration. 
There  is  a  simple  way  to  cure  this  misbehavior  but  It  Is  very  expensive. 

One  simply  orthogonal izes  the  computed  eigenvectors  against  the  null  space 
of  M  but  using  the  K- inner  product. 

However  we  have  found  a  very  inexpensive  procedure  and  it  seems  to  be 
satisfactory.  At  each  step  there  is  a  residual  vector  that  is  not  used  ex¬ 
plicitly  in  the  Lanczos  process.  This  vector  will  become  the  next  Lanczos 
vector.  It  turns  out  that  a  modification  used  by  Ruhe  and  Ericsson  for 
another  purpose  does,  indirectly,  suppress  the  unwanted  components  in  the 
computed  Ritz  vecotrs.  The  block  version  is  mentioned  in  §§1.2  of  Section  2 
(ly  P.S.  Jensen). 

All  this  is  a  somewhat  unexpected  dividend  of  our  investigations. 


1.3  Termination  Criteria. 

As  Me  began  to  make  longer  Lanczos  runs  Me  found  that  the  convergence 
rate,  on  our  cases.  Mas  remarkably  good.  In  one  run  of  100  Lanczos  steps 
68  eigenvalues  converged  to  Morklng  (double)  precision.  We  had  previously 
thought  that  a  2:1  ratio  Mas  Impressive. 

This  perception  brings  to  the  fore  a  difficult  decision  that  users  must 
make.  For  hoM  many  steps  should  a  Lanczos  Iteration  be  continued  before  a 
neM  shift  Is  chosen  anJ  a  new  factorization  of  K-(shift)  M  Is  made? 

Ericsson  and  Ruhe  I  5  .]  favored  many  short  runs  (of  length  20)  but  our  in¬ 
vestigations  suggest  the  opposite.  It  Is  cost  effective  to  keep  going  while 
ever  Ritz  values  are  stabilizing  at  a  good  rate.  More  work  is  needed  here. 
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Section  2.  TEST  RESULTS  FOR  THE  LANCZOS  ALGORITHM  STUDY 


Paul  S.  Jansen  29  July  1985 


This  study  concerns  two  computer  implementations  of  the  Lenczos 
algorithm  developed  under  AFOSR  contract  F49620-84-C-0090  at  the 
University  of  California.  The  first  implementation,  called  LARSO. 
carries  out  a  simple,  single  vector  Lanczos  iteration.  The  second 
implementation,  called  BLANSO.  Carries  out  block  Lanczos  iteration. 
The  objective  of  this  study  was  to  modify  these  implementations  as 
needed  in  order  to  adapt  them  to  the  basic  eigenanalysis  system  BES 
used  at  Lockheed  Missiles  and  Space  Company  for  general  eigen- 
analysis  of  large  structural  models .  The  resulting  computer 
programs  were  then  to  be  used  to  calculate  eigenpairs  for  several 
structural  engineering  models  in  order  to  evaluate  LAI1S0  and  BLANSO. 

Both  LANSO  and  BLAIISO  were  adapted  to  BES  and  test  studies 
were  carried  out  on  both  VAX  11/780  and  Cray  IS  computers.  The 
tests  on  the  VAX  computer  were  used  primarily  to  determine,  develop 
and  check  the  required  modifications  of  the  various  codes.  The  test 
problems  used  on  the  VAX  were  relatively  small  (under  500  equations) 
and  are  not  discussed  extensively  in  this  report.  All  calculations 
in  this  study  were  done  in  single  precision. 

1.  CODE  MODIFICATIONS. 

The  basic  eigenanalysis  system  BES  is  a  collection  of  programs 
and  subroutines  that  interface  with  several  structural  analysis 
programs  for  modal  and  buckling  analysis.  The  interface  is  through 
a  general  global  database  GAL  as  shown  in  Fig.  1.  Besides  the 
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global  database,  BES  also  uses  a  local  working  database  VMSYST  which 
is  based  on  a  virtual  memory  concept.  The  fundamental  eigen* 
analysis  algorithm  used  in  BES  is  a  block  Lanczos  implementation 
originally  developed  by  Prof.  David  Scott  of  the  University  of 
Texas.  This  implementation  is  combined  with  a  spectral  shifting 
algorithm  developed  by  Paul  Jensen  of  Lockheed. 

For  the  present  study,  BES  was  divided  into  three  program 
modules:  (1)  Root  code,  (2)  Kernel  code,  and  (3)  Interface  code,  as 
shown  in  Fig.  2.  The  largest  module,  the  root  code,  contains  all 


Fig.  2.  Partition  of  BES  for  Adaptation 

utility  software  for  data  management,  sparse  matrix  processing, 
spectral  shift  processes,  error  bound  calculations  and  miscellaneous 
processes  not  directly  associated  with  Lanczos  iteration.  The 
kernel  code  contains  all  software  developed  specifically  for  Lanczos 
iteration  and  the  interface  code  provides  the  interface  between  the 
root  code  and  the  kernel  code.  With  this  arrangement,  a  new  kernel 
and  interface  could  readily  be  constructed  for  the  simple  Lanczos 
program  LAIISO  from  the  University  of  California.  In  addition,  a  new 
block  Lanczos  implementation  called  KLEAII  (Kernel  for  Lanczos  Eigen 
Analysis)  was  also  adapted  to  BES.  KLEAN  is  a  modification  of  the 
BLANSO  code  developed  by  Dr.  Nour-Omid  for  this  study. 

1.1  KERNEL  MODIFICATIONS. 

A  fundamental  limitation  of  both  LANSO  and  BLANSO  is  the 
assumption  that  all  of  the  calculated  eigenvectors  can  be  held  in 


RAM  (random  access  main  storage.)  This  assumption  restricts  the 
size  of  problems  that  can  be  solved  and  can  result  in  unnecessarily 
early  termination  due  to  exhausted  RAM  space.  It  can  also  adversly 
affect  analysis  costs  levied  by  some  computer  billing  algorithms 
that  penalize  excessive  use  of  RAM. 

Elimination  of  this  restriction  from  LAHSO  would  be  rather 
involved  and  was  not  attempted  in  this  effort.  The  difficulty  stems 
from  the  use  of  selective  orthogonalization,  which  requires  periodic 
determination  of  eigenvectors  (Ritz  vectors)  during  the  iteration. 
BLAI1S0.  on  the  other  hand,  uses  partial  reorthogonlization  on  the 
Lanczos  vectors,  and  calculates  the  eigenvectors  only  once  at  the 
end.  Modifications  included  elim-  ination  of  the  final  eigenvectors 
calculation,  relegating  this  post  iteration  task  to  utility  programs 
already  available  in  BES.  The  resulting  code,  called  KLEAH, 
produces  only  eigenvalues  with  error  bounds,  eigenvectors  of  the 
reduced  (projected)  problem,  and  Lanczos  vectors. 

Even  with  these  modifications,  the  RAM  requirements  of  KLEAN 
are  considerably  greater  than  those  of  Scott’s  code.  A  major 
contributor  to  this  problem  is  the  desire  to  have  space  for  all  of 
the  reduced  eigenvectors.  This  requirement  can  readily  be  reduced 
to  sufficient  space  for  the  maximum  number  of  solution  vectors 
requested  for  a  given  analysis.  This  reduction,  which  would  require 
a  different  solver  for  the  reduced  problem,  was  not  implemented  in 
this  study. 

BES  is  designed  to  determine  eigenvalues  in  a  specified  section 
of  the  problem  spectrum.  This  imposes  additional  requirements  on 
the  termination  criteria  used  by  LANSO  and  KLEAII.  The  iteration 
should  terminate  when  one  or  more  of  the  following  conditions  are 
satisfied: 

1.  The  number  of  Lanczos  steps  reaches  a  specified  limit 

2.  The  number  of  converged  eigenvectors  reaches  a  specified  limit 

3.  All  of  the  eigenpairs  with  eigenvalues  in  the  specified  range 
have  converged. 

In  cases  1.  and  2.,  the  "completed  section"  of  the  eigenvalue  spectum 
for  which  all  eigenpairs  have  converged  should  be  returned.  This 
section,  of  course,  will  be  a  subsection  of  the  one  specified 
initially . 

There  is  no  fool-proof,  practical  method  for  specifying  the 
completed  section  or  determining  that  case  3  has  been  satisfied. 
Fortunately,  the  method  does  not  have  to  be  fool-proof  because  BES 
has  other  checks,  based  on  spectral  shifting,  that  insure  that  the 
requested  eigenpairs  are  found.  The  cost  for  failure  to  terminate 
the  iteration  properly  according  to  the  above  specifications  is 
extra  computing  time.  Therefore,  we  need  an  efficient  termination 
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method  that  works  most  of  the  tins. 

The  natural  order  for  eigenpairs  to  converge  in  BES,  using 
either  Lanczos  or  subspace  iteration,  is  to  start  near  the  center  of 
the  specified  section  and  work  outward  toward  the  ends.  We 
currently  terminate  iteration  under  criterion  3  above  when  two 
eigenpairs  outside  of  the  specified  section  have  converged.  This 
heuristic  is  also  used  for  specifying  the  completed  subsection  under 
termination  conditions  1  and  2.  The  ends  of  the  completed  section 
are  the  averages  of  the  extreme  pairs  of  converged  eigenvalues. 

There  is  a  problem  when  an  'extreme  eigenvalue  (near  an  end  of 
the  specified  section)  is  a  multiple  root.  It  is  not  generally  cost 
effective  to  terminate  an  iteration  (e.g.  due  to  condition  2  above) 
before  all  eigenvectors  of  that  eigenvalue  are  determined.  In  its 
generic  form,  BES  utilizes  both  a  "hard"  limit,  as  in  2.  and  a  "soft" 
limit  which  occasionally  is  exceeded  when  appropriate.  This 
innovation  was  not  incorporated  in  LAllSO  or  KLEA1I. 

1.2  INTERFACE  MODIFICATIONS. 

Most  of  the  interface  code  development  was  routine  and  tedious 
in  nature.  The  primary  interface  routines  that  are  called  by  the 
kernel  code  are  matrix  utilities  and  vector  storage  and  retrieval 
routines.  These  are  simple  relative  to  the  routines  that  establish 
operating  parmeter  values,  and  monitor  and  control  iteration  progress. 
For  small  problems  this  is  not  difficult  but  for  large  problems, 
certain  trade-off  decisions  between  available  RAM  and  working 
storage  allocation  must  be  made  that  have  a  substantial  effect  on 
the  results.  The  interface  code  developed  for  LANSO  and  KLEAN  in 
this  study  was  adequate  for  this  work  but  should  be  extended  for 
production  analysis. 

Additional  interface  software  required  for  this  work  is  con¬ 
cerned  with  postprocessing  the  results  from  the  kernel  code.  The 
exact  nature  of  this  code  varies,  depending  upon  the  nature  of  the 
kernel  code  used.  The  most  dependable  postprocessing  scheme  appears 
to  be  one  step  of  inverse  iteration  followed  by  an  astutely  ordered, 
modified  Gram-Schmidt  orthonormalization.  This,  however,  is  rather 
expensive.  An  alternative  postprocessing  scheme  is  derived  from 
-1  T 
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which  is  the  jth  step  of  the  Lanczos  interation  process.  Post- 
multiplying  (1)  by  the  matrix  S  of  converged  eigenvectors  for  T 
yields  a  simple  expression  for  one  step  of  inverse  iteration  applied 
to  the  Ritz  vectors 


Tests  in  which  thi  inverse  iteration  post* processing  step  wss 
c*rr**d  out  «*ing  the  right  side  of  (1)  indicate  that  this  is  an 
effective  approach. 

2.  TEST  PROBLEMS 

Three  problems  from  typical  engineering  studies  carried  out  at 
Lockheed  Missiles  and  Space  Company  were  used  for  this  study.  The 
nature  of  the  structures  represented  in  these  problems  is  not 
material  and  we  view  them  simply  as  sparse  matrices  for  which 
eigenvalues  are  required.  All  ofr  them  are  of  the  form 

K>:  ■  MxX 

where  K  and  M  are  non- negative  definite  and  M  is  diagonal  with  rani: 
ranging  from  less  than  half  to  about  two-thirds  of  its  size.  All 
tests  on  these  problems  were  run  on  a  Cray  IS  computer  in  single 
pracision.  The  distribution  of  the  eigenvalues  over  a  small  portion 
of  a  generalized  spectrum  is  illustrated  in  Fig.  3.  The  ordinate 


Figure  3.  Spectral  Distribution  for  Problea  3 
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for*  each  eigenvalue  indicates  the  nuobsr  of  sDallsr  sigsnvsluss  that 
exist.  Thus,  the  sharp  rises  in  the  curve  in  Fig.  3  correspond  to 
Bultiple  or  clustered  eigenvalues.  The  results  of  the  numerical 
tests  are  summarized  in  Table  1. 

Unfortunately,  no  results  for  the  large  problems  using  LANSO 
were  obtained.  The  code  mould  function  very  veil  for  20  to  30 
iterations  and  then  fail.  The  problem  turned  out  to  be  that  more 
eigenvalues  would  converge  simultaneously  than  the  code  was  prepared 
to  handle.  This  determination  was  made  so  late  in  the  study  that  it 
was  not  possible  to  include  test  results  in  this  report. 

Note  that  the  CPU  time  for  problem  3  (indicated  in  Table  lb 
by  *)  is  not  available  because  it  represents  a  subsection  analyzed 
as  a  part  of  a  larger  problem.  The  subsections  for  different 
algorithms  were  identical,  however,  making  the  comparisons  valid. 

The  relatively  small  rank  of  these  problems  appears  to  be 
typical  of  engineering  structural  analysis.  This  factor  tends  to 
favor  a  symmetric  formulation  of  the  Lanczos  iteration  matrix  as 
used  in  Scott's  code,  but  the  recorded  CPU  times  do  not  indicate  a 
significant  benefit.  In  fact,  the  iteration  time  per  Lanczos  step 
(It/L)  tends  to  be  smaller  for  the  new  algorithm.  The  factorization 
and  solve  times  for  the  two  methods,  of  course,  are  about  the  same 
because  they  use  the  same  sparse  matrix  processing  utilities.  This 
result  is  probably  due  to  the  relatively  large  amount  of 
reorthogonalization  done  in  Scott's  code. 

Table  lb  also  includes  some  results  comparing  the  automatic 
spectral  shifting  algorithm  in  BES  with  a  hand  selected  shift 
strategy.  The  hand  selection  of  shift  points  requires  that  the 
solution  be  already  known.  The  results  are  encouraging  from  the 
standpoint  that  the  hand  selected  set  reduced  the  total  processing 
time  by  only  21%.  However,  there  is  presently  no  basis  to  claim 
that  the  automatic  shifting  algorithm  used  is  optimal. 

The  new  algorithm  KLEAll  tends  to  iterate  longer  for  each  set  of 
eigem'alues.  The  stopping  criteria  appears  to  be  too  strict  and 
probably  can  be  relaxed.  This  and  the  question  of  when  and  where  to 
do  a  spectral  shift  remain  to  be  settled  in  future  research. 

Problem  2  and  some  smaller  problems  were  run  using  both  the 
standard  inverse  iteration  post-processing  step  and  technique  de¬ 
scribed  in  Sec.  1.2  with  the  KLEAN  kernel.  The  accuracy  of  the 
results  were  similar  and  the  savings  in  overhead  time  for  problem  2 
was  3  seconds  or  6%. 


Table  la.  Legend  for  Table  lb  Column  Headings 
P  -  Problem :  1.  2  or  3 

M  -  Method:  S  -  Scott’s  code.  K  -  KLEA1I ,  L  -  LAliSO 
Size  -  Number  of  degrees  of  freedom  in  the  problem 
Rank  -  Rank  of  the  M  matrix 

Fac.  -  Time  in  seconds  to  factor  (K  -  (shift) *M) 

E  -  Number  of  converged  eigenpairs  found 

Iter.  -  Time  in  seconds  required  for  the  Lanczos  iteration 

L  -  Number  of  Lanczos  steps  taken 

B  -  Block  size  used  for  the  Lanczos  iteration 

It/L  -  Iteration  time  per  Lanczos  step  (Iter./L) 

BL/E  -  Matrix-vector  multiplies  per  converged  eigenpair  (B*L/E) 
Total  -  Total  CPU  seconds  required  for  the  analysis 
OH  -  Percentage  of  the  total  time  used  for  processing  other 
than  factoring  or  iterating  (Overhead) 


Table  lb.  Summary  of  Numerical  Results 
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1. 


Background. 

Ve  are  interested  in  computing  some  of  the  eigenpairs  of  the  generalised 
aigenprohlexa 


Kx  =  \Mx  (•) 

Here  K  and  U  are  real  and  symmetric  matrices,  of  order  n.  Usually  the 
matrices  are  large  and  sparse.  M  is  at  least  positive  semidefinite.  The  eigen¬ 
values  of  interest  usually  lie  in  some  interval  [a,  b],  specified  by  the  user. 

The  problem  may  be  transformed  into 


provided  the  shift,  jx,  has  been  chosen  so  that  K-fiU  is  nonsingular.  One 
efficient  way  to  solve  our  problem  is  to  apply  a  Lanczos  algorithm  to  the  shifted 
and  inverted  problem  (••).  Let  us  for  a  moment  consider  the  simple  Lanczos 
algorithm  (not  a  block  version).  The  Lanczos  algorithm  will  produce  vectors 
and  real  numbers  a*.  fit  (the  are  positive),  such  that  in  exact  arith¬ 
metic  : 


end 


such  that 


Vf-(vl . Vj).  with  VjUV^I 

tx  Pi 
1  <*2 


1  Pi-i 
Pi- i  , 


=  Vj  Tj+PjVj  +  lt/‘ 

(Here  e;-  is  column  number  j  in  the  identity  matrix.) 

Usually  j«n,  typically  n  =  1000-5000  with  j  -  50-100.  Now,  some  of  the 
eigenvalues  of  7)  will  be  close  to  some  eigenvalues  of  problem  (••).  Usually  the 
end-eigenvalues  of  7}  have  the  best  approximation  properties.  If  (s,  u)  is  an 
eigenpair  of  7).  then 

WiK-nMy'MVjS-vVjS  ||#  =  ftlsjl 
(Here  =  ( uTMuj* .) 


it  PM  is  small,  then  (i/.  V^s )  will  be  a  good  approximation  to  an  eigenpair 
of  (**).  Usually  the  product  is  small  because  Is,  |  is  small,  and  not  because  is. 
Since  we  are  really  interested  in  problem  (*),  we  can  transform  the  v- 
eigenvalues  back  to  the  corresponding  A-eigenvalues.  It  is  easy  to  see  that  the 
transformation  is  given  by  n+l/v.  We  said  above,  that  extreme  eigenvalues  of 
7)  have  the  best  approximation  properties,  and  so  they  correspond  to  A- 
eigenvalues  relatively  close  to  fi. 


To  know  when  to  stop  a  Lanczos  run  {j  is  not  usually  determined  before¬ 
hand,  but  should  be  computed  during  the  iteration,  taking  into  account  the 
rate  of  convergence,  cost  of  another  Lanczos  step,  number  of  requested  eigen- 
pairs  etc.),  it  is  essential  to  know  which  eigenpairs  of  Ti  are  good  approxima¬ 
tions.  We  would,  thus,  like  to  have  an  efficient  and  reliable  algorithm  that  com¬ 
putes  the  interesting  (i/,  s)-pairs  in  each  iteration  of  the  Lanczos  algorithm. 

Such  an  algorithm  should  not  start  from  scratch  for  each  7).  but  it  is  desir¬ 
able  to  update  the  spectral  information  computed  for  7}_i.  The  following  figure 
illustrates  this  idea: 

I  s  interesting  eigenvalue  (i.e.  good  approximation) 

+  s  uninteresting  eigenvalue 
X(T)  -  eigenvalues  of  T 

-1-1— I - +—*—+-++-+—+—+ - 1-1 - 1- 

— ].] - ].  x(7)) 

In  this  example  the  three  extreme  eigenvalues,  at  each  end,  have  not  changed 
much,  and  should  therefore  be  easy  to  update.  There  is  one  new  eigenvalue 
(there  must  of  course  be  one  new  eigenvalue,  since  the  order  of  T  has  increased 
by  one),  which  in  this  case  is  an  interesting  one  (marked  “).  The  new  eigenvalue 
need  not,  of  course,  be  interesting.  If  it  comes  in  the  middle,  for  example,  it  is 
not  likely  a  good  approximation. 

This  problem  has  been  attacked  by  B.  Parlett  and  B.  Nour-Omid,  resulting 
in  an  algorithm,  analyze  T. 


2.  The  Block  Case. 

B.  Parlett  and  myself  have  now  looked  into  the  similar  problem  arising  from 
a  run  of  a  block  Lanczos  algorithm.  So  instead  of  producing  vectors  Vy  and 
numbers  at.  the  block  Lanczos  algorithm  will  produce  blocks  of  vectors 
Vt  eRw“*.  and  square  matrices  Ay,  By  elV**  (A*  is  symmetric  and  By  is  lower  tri¬ 
angular).  Let  p  be  a  fixed  integer,  called  the  block-size.  It  is  supplied  by  the 
user,  or  is  set  by  the  program  Typical  values  are  3-7.  Note  thatp  =  1  gives  us 
the  simple  Lanczos  described  above.  Instead  of  the  tridiagonal  matrix,  in  the 
simple  Lanczos,  a  block  tridiagonal  matrix 

By 
A2 


Aj-i  Bj-i 

Bl-i  Ai 

will  result.  (We  use  the  same  notation  for  this  new  matrix.  Note  that  the  order  of 
Tf  i*  »  .)  Since  the  By  are  lower  triangular,  7)  is  a  bandmatrix  having  half 
bandwidth  p .  A  residual  bound  can  be  computed  as  the  norm  of  the  product  of 
the  last  block  and  the  bottom  segment  of  the  eigenvector  (this  corresponds  to 
fij  i*j  I  when  p  =  1). 
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We  are  looking  for  a  new  algorithm  that  solves  essentially  the  same  problem 
as  in  the  simple  case,  !.e.  it  should  monitor  the  behaviour  of  the  extreme  eigen¬ 
values  of  Tj.  now  being  a  block  tridiagonal  matrix.  As  before  we  have  a 
sequence  of  matrices  Tt,  7*. ....  but  for  each  block  Lanczos  iteration,  the  order 
of  Tj  increases  by  p . 

3.  Some  Reasons  Why  the  Block  Case  is  Harder  to  Deal  With 

This  change,  from  p  =  1  to  p  >  1.  may  not  seem  very  dramatic,  but  it  gives 
rise  to  several  new  difficulties,  of  which  some  will  be  presented  below. 


Ifp  =1  (the  are  assumed  to  be  positive,  i.e.  Tj  is  unreduced  (the  Lane- 
sos  algorithm  would  have  been  stopped  if  a  very  small  flj  had  occurred))  T} 
has  distinct  eigenvalues.  Tj  may,  of  course,  have  very  close  eigenvalues 
(even  if  not  any  0,  is  small,  compare  Wilkinson's  famous  example  (The  Alge¬ 
braic  Eigenvalue  Problem,  page  309)).  Whenj>  >  1,  Tj  may  have  truly  multi¬ 
ple  eigenvalues.  This  is,  in  fact,  one  reason  why  the  block  algorithm  is  supe¬ 
rior  for  handling  multiple  eigenvalues  in  the  original  problem  (*). 

If  p  -  1  the  eigenvalues  of  Tj  and  7}+l  strictly  interlace  each  other.  This 
property  gives  rise  to  several  useful  results: 

(1)  It  is  easy  to  find  the  new  eigenvalues  of  7)*,.  since  there  can  only  be 
one  between  two  eigenvalues  of  Tj  (and  we  know  most  of  the  interest¬ 
ing  eigenvalues  of  Tj). 

(2)  We  can  estimate  gaps  between  eigenvalues  (which  implies  that  we  can 
use  more  powerful  error  bounds). 

These  two  properties  fail  when  p  >  1.  Going  from  Tj  to  7)*,  we  may  have 
several  new  eigenvalues  between  two  of  Tj.  In  fact,  using  Cauchy's  interlace 
theorem,  we  can  only  say  that  between  two  eigenvalues  of  Tj  there  may  be 
p  eigenvalues  of  TjfX.  These  new  ones  can  be  grouped  together,  coincide 
with  eigenvalues  of  Tj  (even  though  the  Bi  are  nonsingular).  Since  p  may 
be  rather  large,  the  spectrum  may  have  undergone  large  changes  between 
two  consecutive  calls  of  the  analyse  routine.  Consider  the  following,  little 
example. 


(•)).  If  the  problem  has  eigenvalues  of  high  multiplicity,  or  if  1/0  is  expensive,  it 
may  pay  to  use  a  larger  value  on  p .  A  good  algorithm  should  try  to  take  into 
account  varying  values  of  p.  To  be  specific,  suppose  that  inverse  iteration  is 
used  in  the  analyse  routine.  We  could  consider  an  inverse  iteration  of  the  form: 

Given  a  starting  vector  y ,  and  first  shift  a 

For  k  *  1,  2.  3  ...  until  convergence  do 
j  factor  Tj-ol 

»*V* 

For  i  =  1  to  m*  do 
J  solve  {Tj-oI)z  -b 

b  -•/ II * II  j 

p=p(b)  (  p(b)  is  the  Rayleigh  quotient)  ) 

Vk  +  l  s  & 
o-p  ) 

Taking  m*  =  1.  for  all  k.  gives  the  standard  Rayleigh  quotient  iteration.  It 
may  be  advantageous  to  vary  m*.  If,  for  example,  y,  is  not  too  close  to  the 
requested  eigenvector,  the  first  shift  may  be  rather  poor,  and  it  may  give  faster 
overall  convergence,  if  the  next  shift  is  refined  by  using  several  solves. 

Example. 

In  this  example  some  output,  from  an  algorithm  like  the  one  above,  is 
presented.  Tt  =  diap(  0,  1,2.  2.  3.  8.5,  7,  8,  9.  10  ).  The  first  shift  was  3.6  (a 
rather  poor  shift).  The  starting  vector  was: 

V!  = 

3.7796447e-01  3.7796447e-0l  3.7?98447e-09 

3.7796447e-09  3.7798447e-01  3.7?96447e-01 

3.7796447e-09  3.7796447e-01  3.7?96447e-01 

3.7796447e*01 

One  should  interpret  this  y,  in  the  following  way:  2,  2.  and  7  are  known  eigen¬ 
values.  and  we  have  orthogonalised  the  starting  vector  against  the  known  eigen¬ 
vectors.  We  have  the  same  components  in  the  other  eigenvectors. 

What  follows  below  is  a  printout,  for  a  sequence  of  runs,  where  mk  was  1,  2, 
3.  etc.  Note  that  m*  did  not  vary  with  k.  The  first  column  indicates  what  factor¬ 
isation  is  being  used.  Repeated  values  means  that  we  make  several  solves,  using 
the  same  factorisation.  The  second  column  contains  1/||*!|.  The  third  is  the 
Rayleigh  quotient  formed  after  each  solve.  The  last  column  contains  rho- 
lambda,  the  difference  between  the  Rayleigh  quotient  and  the  requested  eigen¬ 
value  (which  is  3).  The  termination  criterion  was  abs(rho-lambda)  £  le-16  (in 
this  special  case  we  knew  A). 
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1/||*||  p  error 

1  1.4716494e+00  3.1820465e+00  -1.8204650e-01 

2  1.7475323e— 01  2.9997320e+00  2.680 140  le -04 

3  2.680B21  le-04  3.0000000e+00  9.9B01993e-12 

4  9.9601993e— 12  3.0000000e+00  0.  e+00 

§  solves  /  step  =  1 

factorisations  =  4  (total  number  of  factorisations) 
solves  -  4  (total  number  of  solves) 


1/11*11 

1  1.4716494e+00 

1  6.4527501e— 01 

2  1.805691  le-03 

2  1.8008833e— 03 

3  2.2204460e— 15 

#solves  /  step  =  2 
factorisations  =  3 
solves  —  5 


P 

3.1620465e+00 

3.0016009e400 

3.0000000e+00 

3.0000000e+00 

3.0000000e+00 


error 

-1.6204650e-01 
-1.800BB33e-03 
2.6730501e-09 
2.2204460e-15 
0.  e+00 


1/11*11  p  error 

1  1.4716494e+00  3.1620485e+00  -1.6204650e-01 

1  6.4527501e— 01  3.0018009e+00  -l.B00BB33e-03 

1  8.0172374e-01  2.9999564e+00  4.383225Be-05 

2  4.3637928e— 05  3.0000000e+00  l.ll46639e-13 

2  4.363225Be— 05  3.0000000e+00  0.  e+00 


#solves  /  step  = 
factorisations  = 
solves  = 


3 

2 

5 


1/11*11  p  error 

1  1.4716494e+00  3.1620465e+00  -1.6204650e-01 

1  6.4527501e— 01  3.0018009e+00  -1.800BB33e-03 

1  8.0172374e— 01  2.9999564e+00  4.383225Be-05 

1  8.0007432e— 01  2.9999947e+00  5.3417?04e-06 

2  5.341B02Be-06  3.0000000e+00  1.1 102230e-16 

2  5.3417704e-08  3.0000000e+00  0.  e+00 

^solves  /  step  =  4 
factorisations  =  2 
solves  =  8 


1/11*11  p  error 

1  1.4716494e+00  3.1620485e+00  -1.8204850e-01 

1  8.4527501e— 01  3.0016009e+00  -1.6008833e-03 

1  8.0172374e— 01  2.9999564e+00  4.3632258e-05 

1  8.0007432e-01  2.9999947e+00  5.3417704e-06 

1  8.0000347e-01  2.9999996e+00  3.9003350e-07 

2  3.9003362e— 07  3.0000000e+00  0.  e+00 

#  solves  /  step  =5 
factorisations  =  2 
solves  =  6 


1/11*11  p  error 

1  1.4716494e+00  3.1620465e+00  -1.6204650e-01 

1  6.4527501e-01  3.0016009e+00  -1.6008833e-03 

1  8.0172374e— 01  2.9999564e+00  4.3632258e-05 

1  6.0007432e— 01  2.9999947e+00  5.3417704e-06 

1  8.0000347e— 01  2.9999996e+00  3.9003350e-07 

1  6.0000017e— 01  3.0000000e+00  2.52348B0e-08 

2  2.5234B81e-0B  3.0000000e+00  0.  e+00 

#5olves  /  step  =  8 
factorisations  =  2 
solves  =  7 


1/ 1|  s  II  p  error 

1  1.4716494e+00  3.1620465e+00  -1.8204650e-01 

1  6.4527501e— 01  3.0016009e+00  -1.6008833e-03 

1  6.0172374e— 01  2.9999564e+00  4.3832258e-05 

1  8.0007432e-01  2.9999947e+00  5.3417704e-06 

1  6.0000347e— 01  2.9999996e+00  3.9003350e-07 

1  8.0000017e— 01  3.0000000e+00  2.52348B0e-08 

1  B.OOOOOOle— 01  3.0000000e+00  1.54191B4e-09 

2  1.5419184e— 09  3.0000000e+00  0.  e+00 

^solves  /  step  =  7 
factorisations  -  2 
solves  =  8 


factor 


solve  n(2p  +  l)  - p*  4-  3n  •  2n(p+2) 

The  extra  3n-term  in  solve  comes  from  normalising  the  solution,  and  computing 
the  Rayleigh  quotient.  (There  are  other  ways  to  measure  convergence,  not  using 
the  Rayleigh  quotient,  but  we  will  not  discuss  that  here.)  Below  follows  a  table 
over  the  ratio,  factor/solve,  for  varying  p. 


p  factor  /  solve 


1 

0.3333 

2 

0.6250 

3 

0.9000 

4 

1.1667 

5 

1.4286 

6 

1.6875 

7 

1.9444 

B 

2.2000 

9 

2.4545 

10 

2.7083 

11 

2.9615 

12 

3.2143 

13 

3.4667 

14 

3.7188 

15 

3.9706 

20 

5.2273 

25 

6.4815 

30 

7.7344 

For  large  values  of  p .  the  quotient  behaves  like  p/  4. 

Depending  on  p,  different  m*  will  be  optimal.  If,  for  example,  factor  is  more 
expensive  than  solve.  3  solves/factor  is  optimal  (since  it  minimises  the  total 
cost  in  this  example).  Ifp  =1,  usual  RQI  is  the  optimal  choice. 

1  have  done  some  tests  letting  m*  vary  during  one  run,  but  the  results  are 
too  preliminary  to  be  included  here. 


3.1.  Algorithm  1. 

The  first  algorithm  was  a  try  to  generalise  analyse  T  (for  p  =  1)  to  p  >  1.  To 
avoid  the  wealth  of  detail  necessary  to  specify  the  algorithm  in  an  algorithmic 
language,  we  will  only  present  the  basic  strategy  using  a  few  figures. 


I  =  eigenvalue  converged  to  working  accuracy 
•  -  eigenvalue  on  the  way  to  converge 

•f  b  unconverged  eigenvalue  (these  eigenvalues  may  move  around 
quite  a  bit,  and  they  need  not  settle  down) 

X(T)  =  eigenvalues  of  T 

-1-1 — " . — •-! - 1-  \(7)_,) 

— jj - -J-  X(7>) 

We  keep  information  about  the  eigenvalues  (and  corresponding  eigenvectors) 
marked  1  and  ",  but  no  information  about  the  +  eigenvalues.  This  is  a  reason¬ 
able  approach,  since  we  expect  the  end-eigenvalues  to  be  good  approximations. 
The  ones  in  the  middle  are  quite  useless  for  our  purposes.  We  also  keep  informa¬ 
tion  about  the  bounds  for  the  eigenvalues  (this  is  what  gives  the  1,"- 
classification). 

Starting  at  the  left  end  of  the  spectrum  of  7)_,  (we  do  not  know  the  spec¬ 
trum  of  Tj.  That  is  what  we  would  like  to  find  out.),  march  through  the  I  and  * 
eigenvalues,  and  update  (i.e.  refine  the  eigenvalue  and  eigenvector  using 
inverse  iteration)  the  "-eigenvalues.  We  do  not  have  to  refine  the  I-eigenvalues, 
since  they  are  fully  converged.  In  this  process  some  "-eigenvalues  can  become 
1-eigenvalues  (but  not  the  other  way).  In  the  example,  the  one  marked  ~  has 
undergone  this  transformation.  Having  finished  with  the  left  side,  we  do  the 
same  thing  on  the  right  side  (but  in  the  reverse  order).  To  be  able  to  continue 
this  process,  we  must  probe  into  the  unknown  region  in  the  middle  of  the  spec¬ 
trum.  since,  if  do  not  get  any  new  "-candidates,  we  would  end  up  with  a  fixed 
number  of  1-eigenvalues.  The  eigenvalue  marked  *-  is  such  a  new  candidate.  The 
actual  algorithm  starts  at  the  innermost  *  or  1-eigenvalue  (at  each  side)  and 
adds  new  eigenvalues  as  long  as  they  qualify  as  "-eigenvalues. 


3.1.1.  Intruders. 

There  is  one  important  case  which  we  have  not  treated  so  far.  It  is  illus¬ 
trated  in  the  following  figure: 

. *-+ . .  X(7)_i) 

...  -• - • - . . X(7)) 

In  this  case  we  have  a  new  eigenvalue  (marked  **)  between  two  known  ones.  We 
would  detect  this  situation  using  the  eigenvalue  count  we  get  from  the  factori¬ 
sation  (when  doing  the  inverse  iteration).  This  new  eigenvalue  must  be  located 
(we  only  know  an  interval  in  which  it  lies),  and  we  use  a  combination  of  bisection 
and  inverse  iteration. 

One  algorithmic  detail:  To  get  a  clearer  code,  we  represent  the  eigenvalues 
at  the  right  end,  as  the  left  end  eigenvalues  of  the  matrix  -7).  So,  we  deal  with 
two  left  ends. 

Now.  when  p  >  1,  we  can  get  clusters  of  eigenvalues,  multiple  eigenvalues, 
and  several  intruders  between  a  new  pair.  I  did  write  a  FORTRAN  program  fol¬ 
lowing  the  principles  above.  It  worked  quite  well  for  distinct  eigenvalues,  but  not 
so  for  clusters.  We  have  not  given  up  this  algorithm  yet,  but  several  problems 
need  to  be  solved  to  get  a  working  program.  Due  to  these  problems  we  con¬ 
sidered: 


1).2.  Algorithm  2. 

Since  we  are  dealing  with  clusters  it  is  reasonable  to  used  a  method  which 
determines  higher  order  subspaces,  instead  of  simple  eigenvectors.  One  such 
method  is  subspace  iteration.  In  this  second  algorithm  (which  I  will  not  discuss), 
we  replaced  simple  inverse  iteration  by  subspace  iteration.  A  cluster  would  be 
defined  by  having  overlapping  bounds,  as  in  the  following  figure,  where  the  two 
end*eigenvalues  are  isolated,  and  the  three  in  the  middle  form  a  cluster. 

- - . [ . •  -] 

[...]  marks  the  interval  Xiboimd. 

One  worry  at  this  point,  was  that  the  code  was  becoming  quite  long.  In  par¬ 
ticular  we  had  problems  with  the  part  finding  intruders.  It  is,  in  fact,  the  com¬ 
pletely  general  problem:  Given  an  interval  [a,  b]  compute  a  known  number  of 
eigenvalues  in  the  interval.  This  lead  us  to  look  at  yet  another  algorithm  : 


3.3.  Algorithms. 

This  is  the  simplest  algorithm  we  have  considered,  and  it  is  quite  expensive. 
The  idea  is  to  first  reduce  Tj  to  tridiagonal  form.  This  can  be  made  using 
orthogonal  rotations.  (Starting  with  the  first  row,  zero  the  right-most  element, 
this  will  give  rise  to  a  new  element,  outside  the  band.  This  new  element  is  chased 
over  the  edge  with  a  sequence  of  rotations.  Then  zero  the  next  element  in  the 
first  row  etc.  Continue  this  process  row-wise.)  It  is  quite  an  expensive  algorithm, 
the  operation  count  is  s  J$n8(p -l)(4+ 13/ 2p),  where  n  =  jip  as  usual.  (Note  that 
it  is  a  n8p  process,  factor  and  solve  are  np8  and  np  processes.) 

Having  made  the  reduction,  we  compute  all  the  eigenvalues  of  the  tridiago¬ 
nal  matrix.  This  is  considerable  cheaper  than  the  initial  reduction  (at  least  on 
computers  with  standard  architecture.  According  to  B.  Nour-Omid,  House¬ 
holder  reduction  on  a  full  matrix,  on  a  CRAY,  is  comparable  in  time  to  finding 
the  eigenvalues  of  the  tridiagonal).  We  would  save  the  eigenvalues  from  the  pre¬ 
vious  step  (so  A(7'J_j)  and  A  (Tj).  are  available).  Now  we  would  make  a  com¬ 
parison  between  the  two  sets  of  eigenvalues.  If  an  eigenvalue  of  7)  is  close  to 
any  of  T^_,t  we  would  compute  the  corresponding  eigenvector  using  inverse 
iteration.  If  the  eigenpair  is  a  good  one  (we  compute  a  bound  using  Bj  and  the 
eigenvector)  it  is  saved,  otherwise  it  is  discarded.  Sometimes  we  would  already 
have  an  approximation  of  the  eigenpair.  and  it  would  suffice  to  update  the 
eigenvector.  To  make  this  strategy  work  well  in  presence  of  clusters  and  multi¬ 
ple  eigenvalues,  we  would  need  a  carefully  coded  inverse  iteration.  (It  could,  for 
example,  orthogonalise  the  starting  vector  against  known  eigenvectors  (having 
eigenvalues  close  by),  and  it  might  check  the  orthogonality  of  the  resulting 
eigenvector  to  the  previously  computed.) 

4.  Other  Alternatives. 

We  have  also  considered  some  other  alternatives.  One  that  does  not  work 
(unfortunately)  is  the  following:  Given  7)  in  tridiagonal  form  (and  the  transfor¬ 
mation  matrix  giving  the  reduction),  perform  a  few  operations  to  get  7)+1  (given 
Bj  +  i  and  A/  +  i)  in  tridiagonal  form.  The  cost  of  doing  this  seems  to  be  equal  to 
starting  from  scratch  with  7}  +  j. 

Another  alternative,  not  very  cheap,  is  to  perform  the  reduction  in  each 
step  (from  scratch),  and  then  feed  the  resulting  tridiagonal  (starting  at  step 
jp  + 1)  to  analyze  T.  The  main  advantage  is  the  extreme  simplicity,  essentially 
two  subroutine  calls. 


*  *  "Other  solutions  would  be  to  use  the  Lanczos  algorithm,  and  preferably  the 

block  algorithm,  to  solve  the  subproblem,  but  it  tends  to  give  a  lot  of  code  (or 
demand  recursion). 


We  are  presently  working  on  ways  to  change  algorithm  No.  1  to  cope  with 
clusters  and  other  problems,  and  we  hope  that  these  efforts  will  result  in  a  fast 
and  reliable  algorithm. 


